NSF PAR Search | NSF Public Access Repository

Aligning Language Models with Demonstrated Feedback

Shaikh, Omar; Lam, Michelle S; Hejna, Joey; Shao, Yijia; Cho, Hyundong; Bernstein, Michael S; Yang, Diyi (April 2025, International Conference on Learning Representations (ICLR 2025))

Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (< 10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. Concretely, DITTO operates by having an LLM generate examples that are presumed to be inferior to expert demonstrations. The method iteratively constructs pairwise preference relationships between these LLM-generated samples and expert demonstrations, potentially including comparisons between different training checkpoints. These constructed preference pairs are then used to train the model using a preference optimization algorithm (e.g., DPO). We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N = 16). Across our benchmarks and user study, we find that win-rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points. By using demonstrations as feedback directly, DITTO offers a novel method for effective customization of LLMs.
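The pair-construction step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `PreferencePair` and `build_pairs` are hypothetical, and the choice to prefer samples from later checkpoints over samples from earlier ones is an assumption, since the abstract only says that comparisons between checkpoints may be included.

```python
from itertools import combinations
from typing import List, NamedTuple


class PreferencePair(NamedTuple):
    """One comparison for a preference optimizer such as DPO (hypothetical type)."""
    prompt: str
    chosen: str
    rejected: str


def build_pairs(
    prompt: str,
    demonstrations: List[str],
    samples_by_checkpoint: List[List[str]],
) -> List[PreferencePair]:
    """Build preference pairs from demonstrations and model samples.

    samples_by_checkpoint[i] holds outputs sampled from checkpoint i,
    ordered from the earliest checkpoint to the latest.
    """
    pairs: List[PreferencePair] = []

    # Each demonstration is treated as preferred over every sampled output.
    for demo in demonstrations:
        for samples in samples_by_checkpoint:
            for sample in samples:
                pairs.append(PreferencePair(prompt, demo, sample))

    # ASSUMPTION: samples from a later checkpoint are preferred over
    # samples from an earlier one.
    for (_, earlier), (_, later) in combinations(enumerate(samples_by_checkpoint), 2):
        for good in later:
            for bad in earlier:
                pairs.append(PreferencePair(prompt, good, bad))

    return pairs
```

The resulting list could then be passed to whatever preference-optimization trainer is in use; that training step is outside the scope of this sketch.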
Free, publicly-accessible full text available April 25, 2026
Can Large Language Models Transform Computational Social Science?

https://doi.org/10.1162/coli_a_00502

Ziems, Caleb; Held, William; Shaikh, Omar; Chen, Jiaao; Zhang, Zhehao; Yang, Diyi (January 2024, Computational Linguistics)

Large language models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the computational social science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers’ gold references. We conclude that the performance of today’s LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are poised to meaningfully participate in social science analysis in partnership with humans.
Full Text Available
Reliability of electric vehicle charging infrastructure: A cross-lingual deep learning approach

https://doi.org/10.1016/j.commtr.2023.100095

Liu, Yifan; Francis, Azell; Hollauer, Catharina; Lawson, M. Cade; Shaikh, Omar; Cotsman, Ashley; Bhardwaj, Khushi; Banboukian, Aline; Li, Mimi; Webb, Anne; et al (December 2023, Communications in Transportation Research)

Full Text Available
NeuroCartography: Scalable Automatic Visual Summarization of Concepts in Deep Neural Networks

https://doi.org/10.1109/TVCG.2021.3114858

Park, Haekyu; Das, Nilaksh; Duggal, Rahul; Wright, Austin P.; Shaikh, Omar; Hohman, Fred; Polo Chau, Duen Horng (January 2022, IEEE Transactions on Visualization and Computer Graphics)

Full Text Available
EnergyVis: Interactively Tracking and Exploring Energy Consumption for ML Models

https://doi.org/10.1145/3411763.3451780

Shaikh, Omar; Saad-Falcon, Jon; Wright, Austin P; Das, Nilaksh; Freitas, Scott; Asensio, Omar; Chau, Duen Horng (January 2021, 2021 CHI Conference on Human Factors in Computing Systems)
Kitamura, Yoshifumi; Quigley, Aaron; Isbister, Katherine; Igarashi, Takeo (Ed.)
The advent of larger machine learning (ML) models has improved state-of-the-art (SOTA) performance in various modeling tasks, ranging from computer vision to natural language. As ML models continue increasing in size, so do their energy consumption and computational requirements. However, the methods for tracking, reporting, and comparing energy consumption remain limited. We present EnergyVis, an interactive energy consumption tracker for ML models. Consisting of multiple coordinated views, EnergyVis enables researchers to interactively track, visualize, and compare model energy consumption across key energy consumption and carbon footprint metrics (kWh and CO2), helping users explore alternative deployment locations and hardware that may reduce carbon footprints. EnergyVis aims to raise awareness concerning computational sustainability by interactively highlighting excessive energy usage during model training and by providing alternative training options to reduce energy usage.
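The relationship between the two metrics named in this abstract (kWh and CO2) can be illustrated with a short Python sketch. It assumes the standard conversion of energy use multiplied by a grid's carbon intensity; the abstract does not describe how EnergyVis itself computes its figures. The function names and the intensity values in the usage note are hypothetical placeholders, not measured data.

```python
from typing import Dict, List, Tuple


def co2_kg(energy_kwh: float, intensity_kg_per_kwh: float) -> float:
    """Estimate emissions as energy use times grid carbon intensity."""
    return energy_kwh * intensity_kg_per_kwh


def compare_locations(
    energy_kwh: float, intensities: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank candidate locations from lowest to highest estimated emissions."""
    estimates = [
        (name, co2_kg(energy_kwh, intensity))
        for name, intensity in intensities.items()
    ]
    return sorted(estimates, key=lambda item: item[1])
```

For example, `compare_locations(100.0, {"region_a": 0.5, "region_b": 0.25})` ranks the made-up `region_b` first, because its placeholder carbon intensity is lower.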
Full Text Available
